Sampling-Based Speech Parameter Generation Using Moment-Matching Networks

Authors

Shinnosuke Takamichi

Tomoki Koriyama

Hiroshi Saruwatari

Abstract

This paper presents sampling-based speech parameter generation using moment-matching networks for deep neural network (DNN)-based speech synthesis. Although people never produce exactly the same speech twice, even when they try to express the same linguistic and para-linguistic information, typical statistical speech synthesis produces exactly the same speech every time, i.e., synthetic speech has no inter-utterance variation. To give synthetic speech natural inter-utterance variation, this paper builds DNN acoustic models from which speech parameters can be randomly sampled. The DNNs are trained so that the moments of the generated speech parameters are close to those of natural speech parameters. Since the variation of speech parameters is compressed into a simple low-dimensional prior noise vector, the algorithm has a lower computational cost than direct sampling of speech parameters. As a first step towards generating synthetic speech with natural inter-utterance variation, this paper investigates whether the proposed sampling-based generation degrades synthetic speech quality. The evaluation compares the speech quality of conventional maximum likelihood-based generation with that of the proposed sampling-based generation. The results show that the proposed generation causes no degradation in speech quality.
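
The moment-matching idea in the abstract can be illustrated with a small sketch. The abstract does not state the exact training criterion, so the code below assumes a kernel-based maximum mean discrepancy (MMD) with a Gaussian kernel, which is one common way to compare the moments of two sample sets. The names `gaussian_kernel`, `mmd2`, and `toy_generator`, the linear generator, and all dimensions are hypothetical and chosen only for illustration; they are not the authors' model.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * (a @ b.T)
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between sample sets x and y."""
    return (
        gaussian_kernel(x, x, sigma).mean()
        + gaussian_kernel(y, y, sigma).mean()
        - 2.0 * gaussian_kernel(x, y, sigma).mean()
    )


def toy_generator(linguistic, noise, w_ling, w_noise):
    """Hypothetical linear stand-in for a DNN acoustic model.

    Maps linguistic features plus a low-dimensional prior noise vector
    to speech parameters, so different noise draws give different outputs.
    """
    return linguistic @ w_ling + noise @ w_noise


rng = np.random.default_rng(0)

# Stand-ins for natural speech parameters and two candidate sample sets.
natural = rng.normal(size=(200, 3))
same_dist = rng.normal(size=(200, 3))
shifted = rng.normal(loc=2.0, size=(200, 3))

mmd_same = mmd2(natural, same_dist)
mmd_shifted = mmd2(natural, shifted)

# Sampling: the same linguistic input with two noise draws.
linguistic = rng.normal(size=(1, 5))
w_ling = rng.normal(size=(5, 3))
w_noise = rng.normal(size=(2, 3))
sample_a = toy_generator(linguistic, rng.normal(size=(1, 2)), w_ling, w_noise)
sample_b = toy_generator(linguistic, rng.normal(size=(1, 2)), w_ling, w_noise)
```

In this sketch, a training loop would minimize `mmd2` between generated and natural parameters. At generation time only the low-dimensional noise vector is resampled, which is the source of the inter-utterance variation described above.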

Similar resources

A Method for Removing Striping Error in Images from Linear-Array Sensors

In this paper, a destriping technique is proposed for pushbroom-type satellite imaging systems. The technique is based on the Moment Matching algorithm and assumes a linear response for each detector of the imaging system. In most work in this field, the offset parameter in the detectors' response is neglected and images are corrected using only an estimate of the gain parameter. Proposed m...

Speech Enhancement using Adaptive Data-Based Dictionary Learning

In this paper, a speech enhancement method based on sparse representation of data frames is presented. Speech enhancement is one of the most widely applicable areas of signal processing. The objective of a speech enhancement system is to improve either the intelligibility or the quality of speech signals. This process is carried out using the speech signal processing techniques ...

Prediction of Gain in LD-CELP Using Hybrid Genetic/PSO-Neural Models

In this paper, the gain in the LD-CELP speech coding algorithm is predicted using three neural models that are equipped with genetic and particle swarm optimization (PSO) algorithms to optimize the structure and parameters of the neural networks. Elman, multi-layer perceptron (MLP) and fuzzy ARTMAP are the candidate neural models. The optimized number of nodes in the first and second hidden layers of El...

Phonological Mean Length of Utterance in 48-60-Month-old Persian-speaking Children with Isfahani Accent: Comparison of Story Generation and Conversation Samples

Objective: Phonological Mean Length of Utterance (PMLU), a quantitative measure for the assessment of phonological skills, has been considered in developmental studies as a diagnostic and clinical criterion in phonological development. Moreover, it is an indicator of the efficacy of the intervention. The PMLU is a word-level measure that can be calculated on the child’s transcribed speech sampl...

Publication date: 2017
